refactor(cardano): shard, reorder, and merge the EWRAP boundary pipeline#978
Partitions the epoch-boundary EWRAP work unit — previously one monolithic
work unit that materialised O(active_accounts) in memory (rewards map,
deltas, logs, applied_rewards) — into three phase-specific work units that
each commit independently:
* EwrapPrepare: global classification (pools/dreps/proposals), MIRs,
enactment + refund visitors for non-account entities, emits EpochEndInit
seeding EpochState.end with the prepare-time globals and zeroed reward
accumulators.
* EwrapShard(i): range-scoped (first-byte prefix bucket) load of pending
rewards + accounts, runs rewards + drops visitors per account, emits
EpochEndAccumulate with the shard's reward contribution.
* EwrapFinalize: reads the accumulated EpochState.end, emits EpochWrapUp
(which transitions rolling/pparams snapshots and clears ewrap_progress).
Cross-shard handoff piggy-backs on EpochState rather than a new entity:
ewrap_progress: Option<u32> is the durable cursor and EpochState.end
accumulates across shards via the new deltas.
WorkBuffer gains EwrapShardingBoundary{shard_index, total_shards} and
EwrapFinaliseBoundary states; pop_work now takes ewrap_total_shards from
CardanoConfig (default 16). EpochEndAccumulate has an idempotency guard
keyed on ewrap_progress so shard re-execution after a crash is safe.
Detection-only crash recovery at initialize time logs a warning when
ewrap_progress is set; full block-rehydration resume is flagged as TODO.
Memory tests in tests/memory.rs verify both fjall and redb3 honour
range-scoped iter_entities with O(1) heap — the load-bearing property for
the shard design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
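The range-scoped shard partitioning described above (first-byte prefix buckets, default 16 shards) can be sketched roughly as follows. This is a minimal illustration, not the crate's actual API: `shard_key_range`, `PREFIX_SPACE`, and the "shard count divides 256" requirement are assumptions made for the sketch.

```rust
use std::ops::Range;

// One byte of key prefix: every entity key falls into one of 256 buckets,
// so a shard is a contiguous, range-scannable slice of the key space.
const PREFIX_SPACE: u32 = 256;

// Hypothetical helper mirroring the PR's description of prefix bucketing.
fn shard_key_range(shard_index: u32, total_shards: u32) -> Range<u32> {
    // assumed contract: at least one shard, and a divisor of 256 so every
    // shard covers the same number of prefix buckets
    assert!(total_shards >= 1 && total_shards <= PREFIX_SPACE);
    assert!(PREFIX_SPACE % total_shards == 0);
    assert!(shard_index < total_shards);
    let step = PREFIX_SPACE / total_shards;
    let start = shard_index * step;
    // half-open range; an end of 256 means "through the last prefix byte"
    start..start + step
}

fn main() {
    // with the default of 16 shards, shard 0 covers prefixes 0x00..0x10
    println!("{:?}", shard_key_range(0, 16)); // 0..16
}
```

Because each shard is a plain key range, the backing store only needs a range-scoped iterator (the property the memory tests verify) rather than a full-namespace scan per shard.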
…n ESTART

Decouple two responsibilities that were tangled in EwrapPrepareWorkUnit: the global epoch-boundary entity processing (now plain `Ewrap`) and the structural opening of the `EpochState.end` slot (now done by ESTART's `EpochTransition`). Ewrap's `EpochEndInit` delta keeps its overwrite semantics; it now writes into a default-seeded slot rather than from None. Also adds `prev_end` / `prev_ewrap_progress` undo fields to `EpochTransition` (serialized, like the other prev_* fields) so a rollback after restart correctly restores them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-account leg of the epoch close was named after its position in the EWRAP pipeline; AccountShard names what it actually does — apply rewards and pool/drep delegator drops over a key-range slice of the account namespace.

Also renames the related symbols (BoundaryWork::load_shard / commit_shard → load_account_shard / commit_account_shard, WorkBuffer::EwrapShardingBoundary → AccountShardingBoundary, InternalWorkUnit::EwrapShard → AccountShard). The user-facing `ewrap_total_shards` config field is intentionally preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…line

The epoch-boundary sequence is now AccountShard ×N → Ewrap → EwrapFinalize (was Ewrap → AccountShard ×N → EwrapFinalize). Per-account work settles first; the global Ewrap phase then patches the prepare-time fields onto an EpochState.end that already has its reward accumulators populated.

State machine: WorkBuffer::pop_work transitions reordered, and on_ewrap_boundary now takes ewrap_total_shards so the restart-at-boundary entry can construct AccountShardingBoundary directly. The total_shards == 0 defensive branch now skips to EwrapBoundary (global phase) instead of EwrapFinaliseBoundary.

Delta semantics:
- EpochEndInit::apply is now a PATCH — writes only the prepare-time fields (pool counts, epoch_incentives, MIR amounts, proposal refunds) and leaves the accumulator fields alone. ewrap_progress is no longer touched by this delta. Dropped the unused prev_ewrap_progress field.
- EpochEndAccumulate::apply treats ewrap_progress = None as the natural starting state for shard 0 (unwrap_or(0) as the expected cursor).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
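The cursor semantics described above (idempotent re-execution after a crash, `None` read as "expect shard 0") can be modeled in a few lines. This is an illustrative sketch, not the crate's delta code: the `EpochEnd` struct, the `apply_accumulate` shape, and the single `accumulated` field standing in for the reward accumulators are all assumptions.

```rust
// Hypothetical model of the EpochEndAccumulate idempotency guard.
struct EpochEnd {
    ewrap_progress: Option<u32>, // next shard index expected to commit
    accumulated: u64,            // stand-in for the reward accumulators
}

fn apply_accumulate(end: &mut EpochEnd, shard_index: u32, contribution: u64) -> Result<(), String> {
    // None is the natural starting state: expect shard 0 (unwrap_or(0))
    let expected = end.ewrap_progress.unwrap_or(0);
    if shard_index < expected {
        // shard re-executed after a crash: its contribution is already in
        // the accumulator, so applying the delta again is a no-op
        return Ok(());
    }
    if shard_index > expected {
        return Err(format!("shard gap: expected {expected}, got {shard_index}"));
    }
    end.accumulated += contribution;
    end.ewrap_progress = Some(expected + 1);
    Ok(())
}

fn main() {
    let mut end = EpochEnd { ewrap_progress: None, accumulated: 0 };
    apply_accumulate(&mut end, 0, 10).unwrap();
    apply_accumulate(&mut end, 0, 10).unwrap(); // replayed: no double-count
    println!("{}", end.accumulated); // 10
}
```

The guard makes replaying an already-committed shard harmless, which is what lets each shard commit independently and still survive a crash between commits.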
…delta

The boundary close is now a single Ewrap work unit: it runs the global visitors AND emits EpochWrapUp carrying the assembled final EndStats (prepare-time fields combined with the AccountShard-populated accumulator fields). The wrap-up visitor now constructs the final stats locally instead of routing them through a separate EpochEndInit delta.

Side-benefits: one fewer state-machine state, one fewer delta type, one fewer commit cycle. Atomicity also improves — the boundary close is now a single state-writer commit, so a crash between Ewrap and EwrapFinalize is no longer possible.

Test fixture in tests/epoch_pots/main.rs restructured to match the post-reorder pipeline: accumulator reset gates on AccountShard shard_index == 0; rewards CSV is dumped on the Ewrap arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the boundary-pipeline reorder (AccountShard runs before Ewrap), the first epoch's AccountShard hits `EpochEndAccumulate::apply` with `entity.end == None` because Genesis bootstrapped the EpochState before ESTART's `EpochTransition` had a chance to seed the slot. Seed `end = Some(EndStats::default())` directly in Genesis to match the invariant ESTART maintains for every subsequent epoch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough

Adds an AShard work-unit phase that shards per-account epoch-boundary reward application by entity-key range, snapshots per-shard accumulators into epoch state, and refactors Ewrap to handle global MIR/refund/finalization after all shards commit.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Boundary as WorkBuffer/Boundary
    participant AShard as AShardWorkUnit
    participant BWLoad as BoundaryWork::load_ashard
    participant BWCommit as BoundaryWork::commit_ashard
    participant Epoch as EpochState
    participant Ewrap as EwrapWorkUnit
    Boundary->>AShard: emit AShard(slot, shard_index)
    AShard->>BWLoad: load_ashard(state, genesis, shard_index, total_shards, range)
    BWLoad->>BWLoad: load pending rewards for range\niterate accounts, apply rewards+drops
    BWLoad->>Epoch: snapshot shard accumulators\nemit EpochEndAccumulate delta
    AShard->>BWCommit: commit_ashard(state, archive, range)
    BWCommit->>Epoch: write drained logs, update ashard_progress
    BWCommit->>Boundary: commit writers
    Boundary->>Ewrap: after all shards -> emit Ewrap
    Ewrap->>Ewrap: load_ewrap(), apply MIR/refunds, finalize EndStats
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Pulls the AccountShard work unit out of `ewrap/` into a peer module `ashard/` (matching the layout of `estart/`, `rupd/`, `roll/`, `genesis/`). The shared `BoundaryWork` / `BoundaryVisitor` infrastructure and the drops visitor (used by both phases) stay in `ewrap/`; `ashard/` imports them.

Moves: `rewards.rs`, `shard.rs`, `AccountShardWorkUnit` (from `work_unit.rs`), and the `load_*` / `commit_*` impl blocks. Visibility on shared `BoundaryWork` helpers (`new_empty`, `load_pool_data`, `load_drep_data`, `stream_and_apply_namespace`) widened from private to `pub(crate)`. The `ending_state` field also widened to `pub(crate)` so peer modules can mutate it (e.g. `wrapup.flush` already does this).

Method/identifier renames to match the new module path:
- `BoundaryWork::load_account_shard` → `load_ashard`
- `BoundaryWork::commit_account_shard` → `commit_ashard`
- `WorkUnit::name()` returns `"ashard"`

Type name `AccountShardWorkUnit` is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mments

Sweeps the docstrings/comments touched in this PR for references to phases, work units, and deltas that no longer exist after the rename / reorder / merge / split sequence:
- Restore the in-place explanation for the "rewards before drops" HACK in `ashard/loading.rs` (the dangling "see comment on the pre-shard path" pointed to a comment that was deleted when the prepare phase was removed).
- Drop "prepare phase" / "finalize phase" wording from `BoundaryWork` field docstrings, `commit_ewrap` comments, and `loading.rs` section dividers — neither phase exists; there's only Ewrap (global + close) and AccountShard (per-account).
- Update the ESTART `EpochTransition` description in `work_units.md` so it reflects the post-merge data flow: AccountShards populate the accumulators directly, then Ewrap reads them back and emits `EpochWrapUp` with the final `EndStats` (no `EpochEndInit` patch step anymore).
- Rename `compute_prepare_deltas` → `compute_ewrap_deltas`. The "prepare" name was a leftover from the `EwrapPrepare` work unit; the method is now the only Ewrap-phase compute helper.
- Tighten `load_pending_rewards_range` docstring; flag that the `None` branch is currently unused.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shard-related identifiers and comments were named after the legacy EWRAP pipeline that bundled the global epoch-boundary work and the per-account shards together. With AccountShard now a distinct work unit in its own module, those names are misleading. Rename to use the `ashard` prefix consistently with the module path:
- `CardanoConfig::ewrap_total_shards` → `ashard_total`
- `CardanoConfig::DEFAULT_EWRAP_TOTAL_SHARDS` → `DEFAULT_ASHARD_TOTAL`
- `EpochState::ewrap_progress` → `ashard_progress`
- `prev_ewrap_progress` → `prev_ashard_progress` on `EpochEndAccumulate`, `EpochWrapUp`, and `EpochTransition`
- `WorkBuffer::receive_block` / `on_ewrap_boundary` / `pop_work` parameter `ewrap_total_shards` → `ashard_total`
- Error messages in `ashard/shard.rs` updated to match.

Also fixes comment / doc misattributions where "EWRAP" was used for work that's now in `AccountShard`:
- `PendingRewardState` / `DequeueReward` are consumed by `AccountShard`, not Ewrap.
- `PendingMirState` / `DequeueMir` are consumed by Ewrap (clarified).
- `AppliedReward` and the `applied_rewards` field are populated during AccountShard, not Ewrap.
- RUPD's docstring now says rewards are consumed by `AccountShard`.
- Crash-recovery wording in `lib.rs` says "mid-boundary" instead of "mid-EWRAP" since the cursor specifically tracks AccountShard progress.

BREAKING CONFIG CHANGE: existing `dolos.toml` files that explicitly set `ewrap_total_shards` need to rename the key to `ashard_total`. Users relying on the default (omitted) are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns the type and variant names with the module path convention:
- struct `AccountShardWorkUnit` → `AShardWorkUnit`
- enum variant `CardanoWorkUnit::AccountShard` → `AShard`
- enum variant `InternalWorkUnit::AccountShard` → `AShard`
- WorkBuffer state `AccountShardingBoundary` → `AShardingBoundary`
- module re-export and all callers updated to match
- prose / docstrings / log messages also use `AShard` consistently

The module path is `crate::ashard`, so the type now reads as `ashard::AShardWorkUnit`. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback: the user-facing config name should be self-explanatory in `dolos.toml`. Renames everywhere for consistency:
- `CardanoConfig::ashard_total` field → `account_shards`
- `CardanoConfig::ashard_total()` accessor → `account_shards()`
- `CardanoConfig::DEFAULT_ASHARD_TOTAL` → `DEFAULT_ACCOUNT_SHARDS`
- WorkBuffer parameters and error messages updated to match.

BREAKING CONFIG CHANGE: existing `dolos.toml` files that explicitly set this option (under any prior name from this PR) need to use `account_shards`. Users relying on the default are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guards against a config change to `account_shards` corrupting an
in-flight boundary. Previously, if dolos crashed mid-boundary and the
operator changed `account_shards` between crash and restart, the resume
would re-partition the account key space with the new count, mismatching
the cursor's already-committed shards.
Fix: snapshot the boundary's shard count into state at the first
`EpochEndAccumulate` apply. The persisted total is authoritative for the
duration of the in-flight boundary; the new config value only takes
effect on the next boundary.
Changes:
- New `AShardProgress { committed, total }` struct stored at
`EpochState.ashard_progress: Option<AShardProgress>` (was
`Option<u32>`).
- `EpochEndAccumulate` carries `total_shards`. Its apply validates the
delta's `total_shards` matches any previously persisted total and
surfaces an error if they diverge (would only happen if a work unit
was constructed with a stale config view).
- `EpochWrapUp` and `EpochTransition` undo fields adapted to the new
type.
- `AShardWorkUnit::load` / `commit_state` read the persisted total when
present and fall back to `config.account_shards()` for fresh
boundaries.
- `CardanoLogic` caches `effective_account_shards` (= persisted total
if a boundary is in flight, else config). Refreshed at every
`pop_work` call so `receive_block` (which has no state access) can
use the up-to-date value when constructing
`WorkBuffer::AShardingBoundary`.
- Crash-recovery wording updated to surface a clear warning when the
persisted total disagrees with current config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
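The snapshot-and-validate rule this commit describes can be modeled in a small sketch. `AShardProgress` follows the commit message; `apply_total_guard` and its exact shape are assumptions made for illustration, not the crate's delta code.

```rust
// Persisted cursor for an in-flight boundary, per the commit message.
#[derive(Debug, PartialEq)]
struct AShardProgress {
    committed: u32, // shards applied so far in this boundary
    total: u32,     // shard count pinned at the first accumulate
}

// Hypothetical guard: the first EpochEndAccumulate snapshots the boundary's
// shard count into state; later deltas must carry the same total.
fn apply_total_guard(progress: &mut Option<AShardProgress>, delta_total: u32) -> Result<(), String> {
    match progress {
        None => {
            // fresh boundary: the delta's total becomes authoritative
            *progress = Some(AShardProgress { committed: 1, total: delta_total });
            Ok(())
        }
        Some(p) if p.total == delta_total => {
            p.committed += 1;
            Ok(())
        }
        Some(p) => Err(format!(
            "account_shards changed mid-boundary: persisted {} vs delta {}",
            p.total, delta_total
        )),
    }
}

fn main() {
    let mut progress: Option<AShardProgress> = None;
    apply_total_guard(&mut progress, 16).unwrap();
    apply_total_guard(&mut progress, 16).unwrap();
    println!("{:?}", progress); // Some(AShardProgress { committed: 2, total: 16 })
}
```

Because the persisted total wins for the duration of the boundary, an operator edit to `account_shards` between crash and restart cannot re-partition the key space under an in-flight cursor; the new value only applies once the boundary completes.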
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/cardano/src/ashard/rewards.rs (1)
`114-188`: ⚠️ Potential issue | 🟡 Minor

**Log message clarity: add shard context to per-shard telemetry.**

The `flush` method (lines 114–188) now runs once per AShard, but the telemetry messages ("rewards remaining before drain", "SPENDABLE REWARDS LEFT UNPROCESSED", etc.) don't indicate shard scope. Per-shard loading correctly isolates each shard's RewardMap via `load_pending_rewards_range(state, Some(range))`, so the drain and error detection are functionally sound. However, for operational visibility, these log statements should include shard index and total_shards to clarify they reflect per-shard metrics, not boundary-wide state. This is especially helpful for debugging when multiple shards report metrics in parallel.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ashard/rewards.rs` around lines 114 - 188: the per-shard logs in `flush` don't indicate which shard they belong to; update the flush logging to include shard context (shard_index and total_shards). Add `shard_index: usize` and `total_shards: usize` as fields on the AShard struct (or otherwise make them available on self or ctx), ensure any AShard construction/population passes these values, and then include `%shard_index` and `%total_shards` in the `tracing::debug!` and `tracing::error!` invocations (and the debug! at the end) around `ctx.rewards` / `drain_unspendable` / `applied_reward_credentials` so all per-shard telemetry shows shard scope.

crates/cardano/src/model/epochs.rs (1)
`980-981`: ⚠️ Potential issue | 🟡 Minor

**Fix the `unused_doc_comments` rustdoc warning flagged by CI.**

Pipeline reports `rustdoc does not generate documentation for macro invocations` on these lines. The `///` comment immediately above the `prop_compose!` macro invocation is dropped on the floor by rustdoc and triggers the lint. Convert to a regular `//` comment (or move it inside the closure body if you want it preserved).

🛠️ Proposed fix

```diff
-    /// `EpochStatsUpdate::apply` calls `rolling.live_mut` which asserts `next` is None,
-    /// so we need a specialized generator that keeps `rolling.next` empty.
+    // `EpochStatsUpdate::apply` calls `rolling.live_mut` which asserts `next` is None,
+    // so we need a specialized generator that keeps `rolling.next` empty.
     prop_compose! {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/model/epochs.rs` around lines 980 - 981: the rustdoc warning comes from using a `///` doc comment immediately above the `prop_compose!` macro (related to `EpochStatsUpdate::apply` and `rolling.live_mut` which asserts `rolling.next` is None), so replace that leading `///` comment with a non-doc comment (`//`) or move the explanation into the generator body/closure so rustdoc won't try to document the macro invocation; specifically update the comment above the `prop_compose!` invocation that references `rolling.live_mut` and `rolling.next` to use `//` (or relocate it inside the `prop_compose!` closure) to silence the `unused_doc_comments` lint.
🧹 Nitpick comments (12)
crates/cardano/src/ashard/rewards.rs (1)
`56-77`: **Stale `EWRAP` reference in comment.**

The comment at lines 59–61 still says "registered at RUPD start step time but deregistered before EWRAP". After this refactor the per-account registration check happens during `AShard`, not `Ewrap`. Suggest updating to "deregistered before AShard" (or "before the boundary's per-account leg") to keep the comment in lockstep with the renamed pipeline.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ashard/rewards.rs` around lines 56 - 77: update the stale comment referencing "EWRAP" to reflect the new pipeline stage; change the text around the account deregistration explanation to say "deregistered before AShard" (or "before the boundary's per-account leg") so it matches where the per-account registration check now runs. The relevant area to edit is the comment above the `account.is_registered()` branch in the function that calls `reward.total_value()` and `ctx.rewards.return_reward_to_treasury(total)`. Ensure the updated comment keeps the existing clarification about pre-filtered accounts and returned_rewards behavior.

crates/cardano/src/ashard/loading.rs (1)
`19-57`: **Optional: drop the `Option<Range<EntityKey>>` wrapper.**

The doc-comment already states `load_ashard` is the only caller and always passes `Some(range)`. Making the parameter `Range<EntityKey>` directly removes a never-taken branch and tightens the contract. If a future "load all" caller appears, reintroducing the option is trivial.

♻️ Suggested refactor

```diff
-    fn load_pending_rewards_range<D: Domain>(
-        &mut self,
-        state: &D::State,
-        range: Option<Range<EntityKey>>,
-    ) -> Result<(), ChainError> {
-        let pending_iter = state
-            .iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, range)?;
+    fn load_pending_rewards_range<D: Domain>(
+        &mut self,
+        state: &D::State,
+        range: Range<EntityKey>,
+    ) -> Result<(), ChainError> {
+        let pending_iter = state
+            .iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, Some(range))?;
```

And drop the `Some(range.clone())` at the call site (line 81) to pass `range.clone()` directly.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ashard/loading.rs` around lines 19 - 57: the function `load_pending_rewards_range` currently accepts `Option<Range<EntityKey>>` but the only caller (`load_ashard`) always passes `Some(range)`, so tighten the signature to take `Range<EntityKey>` directly; change the parameter type in `load_pending_rewards_range` and update callers (e.g., in `load_ashard` remove `Some(range.clone())` and pass `range.clone()` directly); ensure the body drops any pattern-matching for None (no-op) and uses the range value when calling `state.iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, range)`.

tests/epoch_pots/main.rs (1)
`500-513`: **Optional: collapse the two `if let Some(boundary)` checks.**

The shard-index==0 reset and the per-shard extend can share a single `boundary` binding, which trims a branch and makes the relationship between "capture ending epoch" and "extend rewards" obvious.

♻️ Suggested refactor

```diff
 CardanoWorkUnit::AShard(shard) => {
-    if shard.shard_index() == 0 {
-        // First shard of this boundary — reset accumulator
-        // and capture the ending epoch.
-        accumulated_applied.clear();
-        if let Some(boundary) = shard.boundary() {
-            accumulated_ending_epoch =
-                Some(boundary.ending_state().number);
-        }
-    }
-    if let Some(boundary) = shard.boundary() {
+    if let Some(boundary) = shard.boundary() {
+        if shard.shard_index() == 0 {
+            // First shard of this boundary — reset
+            // accumulator and capture the ending epoch.
+            accumulated_applied.clear();
+            accumulated_ending_epoch =
+                Some(boundary.ending_state().number);
+        }
         accumulated_applied.extend(boundary.applied_rewards.iter().cloned());
     }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/epoch_pots/main.rs` around lines 500 - 513: collapse the two separate `if let Some(boundary)` checks into a single binding for `boundary` when matching `CardanoWorkUnit::AShard(shard)`; inside one `if let Some(boundary) = shard.boundary()` block first check `if shard.shard_index() == 0` to clear `accumulated_applied` and set `accumulated_ending_epoch = Some(boundary.ending_state().number)`, then unconditionally call `accumulated_applied.extend(boundary.applied_rewards.iter().cloned())`; this removes the duplicated boundary lookup and makes the reset-and-extend logic contiguous.

crates/cardano/work_units.md (1)
`5-11`: **Add a language to the fenced block (MD040).**

`markdownlint-cli2` flagged this fence as missing a language. Use `text` (or any non-rendered identifier) so the diagram passes lint.

📝 Suggested fix

````diff
-```
+```text
 Estart → Roll … → Rupd → Roll … → AShard ×N → Ewrap
 (open)  (blocks)  (RUPD) (blocks) (per-account) (global + close)
                                                      │
                                                      ▼
                                           next epoch's Estart
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/work_units.md` around lines 5 - 11: the fenced code block containing the epoch diagram starting with "Estart → Roll … → Rupd → Roll … → AShard ×N → Ewrap" is missing a language tag (MD040); fix it by adding a language identifier (use "text") immediately after the opening triple backticks so the block becomes ```text and the diagram passes markdownlint-cli2
validation.

crates/cardano/src/lib.rs (2)

`284-341`: **Consolidate duplicated `effective_account_shards` lookup logic.**

The inline computation at lines 338–341 is identical to the new helper `read_effective_account_shards` (lines 253–258). On top of that, `load_epoch::<D>(state)` is invoked twice within `initialize` (line 292 for the warnings, line 338 to derive `effective_account_shards`). A small refactor avoids the double read and keeps the "persisted total else config" rule in one place.

♻️ Proposed refactor

Lift the helper to a free fn and reuse it (and reuse the loaded `EpochState` for the warnings):

```diff
-    /// Compute the effective `account_shards` value: stored
-    /// `ashard_progress.total` if a boundary is in flight, otherwise the
-    /// configured value.
-    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
-        load_epoch::<D>(state)
-            .ok()
-            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-            .unwrap_or_else(|| self.config.account_shards())
-    }
+    /// Compute the effective `account_shards` value: stored
+    /// `ashard_progress.total` if a boundary is in flight, otherwise the
+    /// configured value.
+    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
+        effective_account_shards_from_state::<D>(state, &self.config)
+    }
+}
+
+fn effective_account_shards_from_state<D: Domain>(state: &D::State, config: &CardanoConfig) -> u32 {
+    load_epoch::<D>(state)
+        .ok()
+        .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
+        .unwrap_or_else(|| config.account_shards())
 }
```

And in `initialize`, compute once and reuse the loaded epoch for both the warnings and the cached value.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/lib.rs` around lines 284 - 341: extract and reuse the epoch load and the effective-account-shards logic; in `initialize()`, call `load_epoch::<D>(state)` once and store the resulting EpochState (if Ok) for the warnings; replace the inline computation of `effective_account_shards` with a call to the existing helper `read_effective_account_shards` (or lift that helper to a free fn if necessary) so the "persisted total else config" rule is implemented in one place; update references to use the single loaded epoch value for the `tracing::warn!` blocks and derive `effective_account_shards` from `read_effective_account_shards(state, &config)` (or equivalent) to remove the duplicate `load_epoch::<D>(state)` call.

---

`404-408`: **Per-`pop_work` state read may be unnecessary outside boundaries.**

`effective_account_shards` is refreshed on every `pop_work` call (each block cycle) via a `read_entity_typed` lookup, but it only matters at epoch boundaries. The cost is small (single point read), but you could skip the read when the cached value is non-stale by only refreshing when `ashard_progress` could have changed (i.e., before an actual boundary handoff). Worth considering if `pop_work` shows up in profiling.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/lib.rs` around lines 404 - 408: the code refreshes `self.effective_account_shards` on every `pop_work` by calling `read_effective_account_shards::<D>(domain.state())`, but this value only matters when the epoch boundary handoff (`EpochState.ashard_progress.total`) can change; avoid the repeated read by first checking whether `ashard_progress` has changed and only calling `read_effective_account_shards` when it has. Add a small cached marker on the struct (e.g., `last_ashard_total` or `last_ashard_progress`) and in `pop_work` compare the current `ashard_progress.total` (or the appropriate accessor) to that cached value; if different, call `read_effective_account_shards` and update both `effective_account_shards` and the cached marker, otherwise skip the read and keep the existing `effective_account_shards`.

crates/cardano/src/ewrap/commit.rs (1)

`91-100`: **Full-table account scan to apply a handful of MIR rewards.**

`stream_and_apply_namespace::<D, AccountState>(state, &writer, None)` walks every account in state to surface the small set with MIR-queued deltas. The inline comment acknowledges this is "effectively a targeted write via the streaming path", but on mainnet-sized account tables this is a significant cost paid every boundary, regardless of how few MIRs are pending (often zero). Consider a direct path: iterate `self.applied_mir_credentials` (or the credentials of the queued `AssignRewards` deltas), point-read each `AccountState`, apply, and write. The streaming helper can stay for the full-set namespaces (Pool/DRep/Proposal/EpochState), but accounts during Ewrap are inherently sparse.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ewrap/commit.rs` around lines 91 - 100: the current call to `stream_and_apply_namespace::<D, AccountState>(state, &writer, None)` performs a full-table scan of AccountState each boundary; instead implement a targeted path that iterates over `self.applied_mir_credentials` (or the credentials from queued `AssignRewards` deltas), point-reads each `AccountState`, applies the MIR delta, and writes the result via the same writer so only affected accounts are touched; keep `stream_and_apply_namespace` for full-set namespaces like Pool/DRep/Proposal/EpochState, and ensure the new code reuses the existing apply/serialize logic used by `stream_and_apply_namespace` to maintain invariants and error handling.

crates/cardano/src/ashard/work_unit.rs (1)

`62-129`: **Avoid recomputing `total_shards`/`range` from state in `commit_state`.**

`load` computes `total_shards` and `range` from persisted state, then calls `BoundaryWork::load_ashard(..., shard_index, total_shards, range)` — so the boundary already owns these values. `commit_state` repeats the same state read and recomputation, with two minor concerns:

1. **Code duplication**: identical 8-line block in both methods (and a third copy in `crates/cardano/src/lib.rs`).
2. **Subtle drift risk**: if `commit_state` ever observed a different `ashard_progress.total` than `load` did (e.g., a shard's apply landing between the two phases of the same work unit), the recomputed `range` would diverge from what was loaded. Today this can't happen, but cross-phase reuse of the captured values is more obviously correct.

Consider exposing the captured shard info from `BoundaryWork` (or storing the computed `range` in `AShardWorkUnit` after `load`) and have `commit_state` reuse it instead of re-deriving.

♻️ Sketch

```diff
 pub struct AShardWorkUnit {
     slot: BlockSlot,
     config: CardanoConfig,
     genesis: Arc<Genesis>,
     shard_index: u32,
-    boundary: Option<BoundaryWork>,
+    boundary: Option<BoundaryWork>,
+    /// Captured during `load` so `commit_state` reuses the exact same range.
+    range: Option<std::ops::Range<dolos_core::EntityKey>>,
 }
```

Then `load` stores `self.range = Some(range.clone())` and `commit_state` consumes it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ashard/work_unit.rs` around lines 62 - 129: the load/commit_state duplication and drift risk can be fixed by capturing the computed `total_shards` and `range` during `load` and reusing them in `commit_state` instead of re-reading state; after computing `total_shards` and `range` in `load` (before calling `BoundaryWork::load_ashard`), store the values on the work unit (e.g., add fields `self.total_shards` and/or `self.range` and assign them there), and then change `commit_state` to take the boundary via `self.boundary.as_mut()` and use the stored range/total_shards when calling `boundary.commit_ashard::<D>(...)` rather than recomputing via `load_epoch`; update `BoundaryWork` usage only to consume the already-stored range if needed.

crates/cardano/src/work.rs (2)

`276-279`: **Test `TEST_TOTAL_SHARDS = 1` doesn't cover multi-shard sequencing in this state machine.**

These tests collapse `AShard` into the `EWrap` tag and set `TEST_TOTAL_SHARDS = 1`, so the multi-shard loop in `pop_work` (lines 226–245) — including the `next_index >= total_shards` transition — is exercised only with `total_shards = 1`. Worth adding at least one test with `TEST_TOTAL_SHARDS > 1` that asserts the work buffer emits exactly N `AShard` units before transitioning to `EwrapBoundary` (and that `last_point_seen` / cursor invariants hold across that transition). This catches off-by-one and ordering regressions in the boundary state machine that the current tests would silently miss. Want me to draft this test?

Also applies to: 325-330

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/work.rs` around lines 276 - 279: add a new unit test that sets `TEST_TOTAL_SHARDS > 1` (instead of the current `const TEST_TOTAL_SHARDS = 1`) to exercise the multi-shard sequencing in the `pop_work` logic; drive the WorkBuffer/state machine (invoking `pop_work`) and assert that it emits exactly N consecutive `AShard` units (N == TEST_TOTAL_SHARDS) before producing an `EwrapBoundary`, and verify that `last_point_seen` and any cursor invariants are preserved across the transition; target the test to exercise the `pop_work` path that contains the `next_index >= total_shards` branch and use the existing helpers/setup used by the other tests so it integrates with the same state machine and coverage.

---

`120-143`: **`account_shards == 0` "skip-shards" fallback contradicts `validate_total_shards`.**

`crates/cardano/src/ashard/shard.rs::validate_total_shards` rejects `0` outright ("account_shards must be >= 1"), yet both `on_ewrap_boundary` and the `PreEwrapBoundary` arm of `pop_work` treat `0` as "no shards — go straight to Ewrap". If a config bug or missing validation ever hands `0` here, you'll silently skip per-account boundary processing instead of failing fast. The defensive comment ("Shouldn't happen with a valid config") tacitly acknowledges this. Either drop the branches and let an `unreachable!()` / `assert!(account_shards >= 1)` fail loudly, or change `validate_total_shards` to accept `0` as "no shards" if that's the intended semantics. Right now the two pieces disagree.

Also applies to: 210-225

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/work.rs` around lines 120 - 143: the code currently treats `account_shards == 0` as a valid "no shards" path in `on_ewrap_boundary` (and similarly in the `PreEwrapBoundary` arm of `pop_work`), which contradicts `validate_total_shards` that requires `account_shards >= 1`; change the behavior to fail-fast: remove the special-case branch that returns `EwrapBoundary` when `account_shards == 0` and instead assert or panic if `account_shards < 1` (e.g., add `assert!(account_shards >= 1)` at the start of `on_ewrap_boundary`), keeping the normal Restart -> `AShardingBoundary` path for valid shard counts; alternatively, if the intended semantics are that 0 means "no shards", update `validate_total_shards` to accept 0 and document that invariant — pick one consistent approach and apply the same fix in the `PreEwrapBoundary`/`pop_work` code paths so both agree with `validate_total_shards`.

crates/cardano/src/ashard/shard.rs (1)

`37-60`: **`debug_assert!` makes invalid `total_shards` a runtime div-by-zero in release.**

The contract is documented and there's a `validate_total_shards` helper, but the only enforcement inside `shard_key_range` is `debug_assert!`. In release builds, `total_shards == 0` reaches `let step = PREFIX_SPACE / total_shards;` and panics with a less informative message; non-divisor values silently produce truncated `step` arithmetic. Since this is a public function and the cost of a real assertion (or returning `Result`) is negligible compared to the I/O the caller does, prefer `assert!` (or propagate the error from `validate_total_shards`).

🛡️ Proposed change

```diff
 pub fn shard_key_range(shard_index: u32, total_shards: u32) -> Range<EntityKey> {
-    debug_assert!(validate_total_shards(total_shards).is_ok());
-    debug_assert!(shard_index < total_shards);
+    assert!(
+        validate_total_shards(total_shards).is_ok(),
+        "invalid total_shards: {total_shards}"
+    );
+    assert!(shard_index < total_shards, "shard_index {shard_index} >= total_shards {total_shards}");
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ashard/shard.rs` around lines 37 - 60: the function `shard_key_range` uses `debug_assert!` to check `validate_total_shards(total_shards)` and `shard_index < total_shards`, which leaves a division-by-zero and silent truncation in release; change the `debug_assert!` checks to real runtime checks (e.g., `assert!(validate_total_shards(total_shards).is_ok())` and `assert!(shard_index < total_shards)`) or convert the function to return `Result` and propagate the `validate_total_shards()` error from `shard_key_range` so `PREFIX_SPACE / total_shards` cannot divide by zero and invalid `total_shards` are handled deterministically; update callers if you choose the `Result` approach and keep references to `shard_key_range`, `validate_total_shards`, and `PREFIX_SPACE` when making the change.
``` </details> </blockquote></details> <details> <summary>crates/cardano/src/ewrap/loading.rs (1)</summary><blockquote> `280-285`: **`shard_applied_*` fields are dead state on Ewrap-constructed `BoundaryWork`.** `new_empty` zero-initializes `shard_applied_effective`, `shard_applied_unspendable_to_treasury`, and `shard_applied_unspendable_to_reserves`, but Ewrap never reads/writes them — the per-shard accumulators land in `EpochState.end` via `EpochEndAccumulate` and are read from `ending_state.end` by `wrapup.flush`. The fields exist only because `BoundaryWork` is shared with the AShard path. Not a bug, but the dual-use API obscures intent. If a future change adds a third caller, it's easy to misuse these fields. Consider either splitting `BoundaryWork` into `AShardWork`/`EwrapWork` or documenting on the struct which fields each phase owns. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/ewrap/loading.rs` around lines 280 - 285, BoundaryWork's shard_applied_effective / shard_applied_unspendable_to_treasury / shard_applied_unspendable_to_reserves are dead for the Ewrap path (zeroed in new_empty and never read by Ewrap; real per-shard accumulators live in EpochState.end via EpochEndAccumulate and are consumed by wrapup.flush), so fix by either splitting the dual-purpose struct into two clear types (e.g., AShardWork and EwrapWork) and move the AShard-only shard_applied_* fields into AShardWork, or add explicit documentation/comments on BoundaryWork and those three fields noting they are owned/used only by the AShard path; update new_empty, any constructors, and usages of BoundaryWork (including references in EpochEndAccumulate and wrapup.flush) to use the new types or to reflect the documented ownership to prevent future misuse. 
``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In @crates/cardano/src/model/epochs.rs:
- Around line 737-740: The apply path currently does
entity.end.as_mut().expect("ESTART seeded EpochState.end before shards run")
which panics if EpochState.end is None; instead ensure end is initialized to
EndStats::default() when missing (so subsequent EpochWrapUp can overwrite it).
Update the apply code in crates/cardano/src/model/epochs.rs to replace the
expect with a safe initialization (e.g., use get_or_insert_with / get_or_insert
or similar) so entity.end becomes Some(EndStats::default()) if it was None
before mutating it.
- Around line 684-771: EpochEndAccumulate::undo currently assumes apply always
mutated state, causing underflow/corruption when apply returned early; fix by
recording in apply whether it performed mutations and the prior values needed to
revert (capture previous end.effective_rewards, end.unspendable_to_treasury,
end.unspendable_to_reserves and previous entity.ashard_progress/AShardProgress)
into fields on EpochEndAccumulate (or a was_applied flag plus prev_* fields),
then make undo a no-op if apply was skipped or restore those captured prev_*
values and previous ashard_progress instead of unconditionally subtracting and
setting committed/total; update apply (function EpochEndAccumulate::apply) to
populate these fields only when it actually mutates state and add a proptest
exercising apply→undo when ashard_progress is already advanced past
completed_shard_index to lock the behavior.
In
@crates/core/src/config.rs:
- Around line 988-1002: Add runtime validation to reject invalid account_shards
values by calling the existing validate_total_shards() from
crates/cardano/src/ashard/shard.rs during config initialization: when
constructing/using CardanoConfig (e.g., inside CardanoLogic::initialize() before
any use of CardanoConfig::account_shards() or DEFAULT_ACCOUNT_SHARDS), call
validate_total_shards(self.account_shards.unwrap_or(Self::DEFAULT_ACCOUNT_SHARDS))
and return/propagate an error if it fails; ensure the initialization path stops
the startup with a clear error rather than allowing invalid values (0 or
non-divisors of 256) to be used.
In
@skills/debug-epoch-mismatch/SKILL.md:
- Line 138: The documentation still treats reward application as a single EWRAP
work unit but the code split moves per-account reward application into AShard
while Ewrap now handles MIR/global closure; update the tables that list or
reference EWRAP (notably the "Classify the Mismatch", "Dolos Work Units →
Haskell Concepts", and the instrumentation hypothesis table) to distinguish
AShard vs Ewrap: point applyRUpd + reward filtering, unspendable routing and
registration filtering to AShard and point MIR and boundary close to Ewrap, and
adjust any bisection/debugging guidance so it routes to AShard for per-account
reward issues and to Ewrap for MIR/global closure issues. Ensure references
include the file/class names AShard, Ewrap and the function/concept names
applyRUpd, MIR, reward filtering, boundary close to make locating code
straightforward.
Outside diff comments:
In @crates/cardano/src/ashard/rewards.rs:
- Around line 114-188: The per-shard logs in flush don't indicate which shard
they belong to; update the flush logging to include shard context (shard_index
and total_shards). Add shard_index: usize and total_shards: usize as fields on
the AShard struct (or otherwise make them available on self or ctx), ensure any
AShard construction/population passes these values, and then include
%shard_index and %total_shards in the tracing::debug! and tracing::error!
invocations (and the debug! at the end) around ctx.rewards/ drain_unspendable/
applied_reward_credentials so all per-shard telemetry shows shard scope.
In
@crates/cardano/src/model/epochs.rs:
- Around line 980-981: The rustdoc warning comes from using a /// doc comment
immediately above the prop_compose! macro (related to EpochStatsUpdate::apply
and rolling.live_mut which asserts rolling.next is None), so replace that
leading /// comment with a non-doc comment (//) or move the explanation into the
generator body/closure so rustdoc won't try to document the macro invocation;
specifically update the comment above the prop_compose! invocation that
references rolling.live_mut and rolling.next to use // (or relocate it inside
the prop_compose! closure) to silence the unused_doc_comments lint.
Nitpick comments:
In @crates/cardano/src/ashard/loading.rs:
- Around line 19-57: The function load_pending_rewards_range currently accepts
Option<Range> but the only caller (load_ashard) always passes
Some(range), so tighten the signature to take Range directly: change
the parameter type in load_pending_rewards_range and update callers (e.g., in
load_ashard remove Some(range.clone()) and pass range.clone() directly); ensure
the body drops any pattern-matching for None (no-op) and uses the range value
when calling
state.iter_entities_typed::(PendingRewardState::NS, range).
In
@crates/cardano/src/ashard/rewards.rs:
- Around line 56-77: Update the stale comment referencing "EWRAP" to reflect the
new pipeline stage: change the text around the account deregistration
explanation to say "deregistered before AShard" (or "before the boundary's
per-account leg") so it matches where the per-account registration check now
runs; the relevant area to edit is the comment above the account.is_registered()
branch in the function that calls reward.total_value() and
ctx.rewards.return_reward_to_treasury(total). Ensure the updated comment keeps
the existing clarification about pre-filtered accounts and returned_rewards
behavior.
In
@crates/cardano/src/ashard/shard.rs:
- Around line 37-60: The function shard_key_range uses debug_assert! to check
validate_total_shards(total_shards) and shard_index < total_shards, which leaves
a division-by-zero and silent truncation in release; change the debug_assert!
checks to real runtime checks (e.g.,
assert!(validate_total_shards(total_shards).is_ok()) and assert!(shard_index <
total_shards)) or convert the function to return Result and propagate
validate_total_shards() error from shard_key_range so PREFIX_SPACE /
total_shards cannot divide by zero and invalid total_shards are handled
deterministically; update callers if you choose the Result approach and keep
references to shard_key_range, validate_total_shards, and PREFIX_SPACE when
making the change.
In
@crates/cardano/src/ashard/work_unit.rs:
- Around line 62-129: The load/commit_state duplication and drift risk can be
fixed by capturing the computed total_shards and range during load and reusing
them in commit_state instead of re-reading state: after computing total_shards
and range in load (before calling BoundaryWork::load_ashard), store the values
on the work unit (e.g., add fields self.total_shards and/or self.range and
assign them there), and then change commit_state to take the boundary via
self.boundary.as_mut() and use the stored range/total_shards when calling
boundary.commit_ashard::(...) rather than recomputing via load_epoch; update
BoundaryWork usage only to consume the already-stored range if needed.
In
@crates/cardano/src/ewrap/commit.rs:
- Around line 91-100: The current call to stream_and_apply_namespace::<D,
AccountState>(state, &writer, None) performs a full-table scan of AccountState
each boundary; instead implement a targeted path that iterates over
self.applied_mir_credentials (or the credentials from queued AssignRewards
deltas), point-reads each AccountState, applies the MIR delta, and writes the
result via the same writer so only affected accounts are touched; keep
stream_and_apply_namespace for full-set namespaces like
Pool/DRep/Proposal/EpochState, and ensure the new code reuses the existing
apply/serialize logic used by stream_and_apply_namespace to maintain invariants
and error handling.
In
@crates/cardano/src/ewrap/loading.rs:
- Around line 280-285: BoundaryWork's shard_applied_effective /
shard_applied_unspendable_to_treasury / shard_applied_unspendable_to_reserves
are dead for the Ewrap path (zeroed in new_empty and never read by Ewrap; real
per-shard accumulators live in EpochState.end via EpochEndAccumulate and are
consumed by wrapup.flush), so fix by either splitting the dual-purpose struct
into two clear types (e.g., AShardWork and EwrapWork) and move the AShard-only
shard_applied_* fields into AShardWork, or add explicit documentation/comments
on BoundaryWork and those three fields noting they are owned/used only by the
AShard path; update new_empty, any constructors, and usages of BoundaryWork
(including references in EpochEndAccumulate and wrapup.flush) to use the new
types or to reflect the documented ownership to prevent future misuse.
In
@crates/cardano/src/lib.rs:
- Around line 284-341: Extract and reuse the epoch load and the
effective-account-shards logic: in initialize(), call load_epoch::(state)
once and store the resulting EpochState (if Ok) for the warnings; replace the
inline computation of effective_account_shards with a call to the existing
helper read_effective_account_shards (or lift that helper to a free fn if
necessary) so the "persisted total else config" rule is implemented in one
place; update references to use the single loaded epoch value for the
tracing::warn! blocks and derive effective_account_shards from
read_effective_account_shards(state, &config) (or equivalent) to remove the
duplicate load_epoch::(state) call.
- Around line 404-408: The code refreshes self.effective_account_shards on every
pop_work by calling read_effective_account_shards::(domain.state()), but this
value only matters when the epoch boundary handoff
(EpochState.ashard_progress.total) can change; avoid the repeated read by first
checking whether ashard_progress has changed and only calling
read_effective_account_shards when it has. Add a small cached marker on the
struct (e.g., last_ashard_total or last_ashard_progress) and in pop_work compare
domain.state().ashard_progress.total (or the appropriate accessor) to that
cached value; if different, call read_effective_account_shards and update both
effective_account_shards and the cached marker, otherwise skip the read and keep
the existing effective_account_shards.
In
@crates/cardano/src/work.rs:
- Around line 276-279: Add a new unit test that sets TEST_TOTAL_SHARDS > 1
(instead of the current const TEST_TOTAL_SHARDS = 1) to exercise the multi-shard
sequencing in the pop_work logic: drive the WorkBuffer/state machine (invoking
pop_work) and assert that it emits exactly N consecutive AShard units (N ==
TEST_TOTAL_SHARDS) before producing an EwrapBoundary (or EWrap/Ewrap as used in
this code), and verify that last_point_seen and any cursor invariants are
preserved across the transition; target the test to exercise the pop_work path
that contains the next_index >= total_shards branch and use the existing
helpers/setup used by the other tests so it integrates with the same state
machine and coverage.
- Around line 120-143: The code currently treats account_shards == 0 as a valid
"no shards" path in on_ewrap_boundary (and similarly in the PreEwrapBoundary arm
of pop_work), which contradicts validate_total_shards that requires
account_shards >= 1; change the behavior to fail-fast: remove the special-case
branch that returns EwrapBoundary when account_shards == 0 and instead assert or
panic if account_shards < 1 (e.g., add assert!(account_shards >= 1) at the start
of on_ewrap_boundary), keeping the normal Restart -> AShardingBoundary path for
valid shard counts; alternatively, if the intended semantics are that 0 means
"no shards", update validate_total_shards to accept 0 and document that
invariant—pick one consistent approach and apply the same fix in the
PreEwrapBoundary/pop_work code paths so both agree with validate_total_shards.
In
@crates/cardano/work_units.md:
- Around line 5-11: The fenced code block containing the epoch diagram starting
with "Estart → Roll … → Rupd → Roll … → AShard ×N → Ewrap" is missing
a language tag (MD040); fix it by adding a language identifier (use "text")
immediately after the opening triple backticks so the block becomes ```text and
the diagram passes markdownlint-cli2 validation.
In
@tests/epoch_pots/main.rs:
- Around line 500-513: Collapse the two separate `if let Some(boundary)` checks
into a single binding for `boundary` when matching
`CardanoWorkUnit::AShard(shard)`: inside one `if let Some(boundary) = shard.boundary()` block first check `if shard.shard_index() == 0` to clear
`accumulated_applied` and set `accumulated_ending_epoch = Some(boundary.ending_state().number)`, then unconditionally call
`accumulated_applied.extend(boundary.applied_rewards.iter().cloned())`; this
removes the duplicated boundary lookup and makes the reset-and-extend logic
contiguous.</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro **Run ID**: `13b48d06-fb3d-459b-b0a0-59ac2b808397` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 885094deed721abaa8bca14087703bd7ea409c28 and 8174d2a832bd81baa5228d8b3059b0b2e0bfd7ba. </details> <details> <summary>📒 Files selected for processing (23)</summary> * `crates/cardano/src/ashard/commit.rs` * `crates/cardano/src/ashard/loading.rs` * `crates/cardano/src/ashard/mod.rs` * `crates/cardano/src/ashard/rewards.rs` * `crates/cardano/src/ashard/shard.rs` * `crates/cardano/src/ashard/work_unit.rs` * `crates/cardano/src/ewrap/commit.rs` * `crates/cardano/src/ewrap/loading.rs` * `crates/cardano/src/ewrap/mod.rs` * `crates/cardano/src/ewrap/work_unit.rs` * `crates/cardano/src/ewrap/wrapup.rs` * `crates/cardano/src/genesis/mod.rs` * `crates/cardano/src/lib.rs` * `crates/cardano/src/model/epochs.rs` * `crates/cardano/src/model/mod.rs` * `crates/cardano/src/model/pending.rs` * `crates/cardano/src/rupd/work_unit.rs` * `crates/cardano/src/work.rs` * `crates/cardano/work_units.md` * `crates/core/src/config.rs` * `skills/debug-epoch-mismatch/SKILL.md` * `tests/epoch_pots/main.rs` * `tests/memory.rs` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
```rust
let end = entity
    .end
    .as_mut()
    .expect("ESTART seeded EpochState.end before shards run");
```
**`apply` panics if `EpochState.end` is `None` — fragile coupling to Genesis seeding.**

The `expect("ESTART seeded EpochState.end before shards run")` relies on Genesis seeding `end = Some(EndStats::default())` for the first boundary, and on `EpochTransition` re-seeding it on every subsequent transition. This is correct on the happy path but turns any future code path that constructs an `EpochState` with `end = None` (test fixtures, migration tools, recovery utilities, a genesis bootstrap regression) into a hard panic deep inside delta application.

Cheaper to defend: initialize `end` to `EndStats::default()` if absent rather than panic. The `EpochWrapUp` delta later overwrites it with the final stats, so semantics are preserved.
🛡️ Suggested change

```diff
-        let end = entity
-            .end
-            .as_mut()
-            .expect("ESTART seeded EpochState.end before shards run");
+        let end = entity.end.get_or_insert_with(EndStats::default);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/cardano/src/model/epochs.rs` around lines 737 - 740, The apply path
currently does entity.end.as_mut().expect("ESTART seeded EpochState.end before
shards run") which panics if EpochState.end is None; instead ensure end is
initialized to EndStats::default() when missing (so subsequent EpochWrapUp can
overwrite it). Update the apply code in crates/cardano/src/model/epochs.rs to
replace the expect with a safe initialization (e.g., use get_or_insert_with /
get_or_insert or similar) so entity.end becomes Some(EndStats::default()) if it
was None before mutating it.
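As a standalone illustration of the suggested `get_or_insert_with` fix — a minimal sketch, assuming `EndStats` derives `Default`; the struct fields here are stand-ins, not the real dolos-cardano types:

```rust
// Hypothetical stand-in for the real `EndStats`; only one accumulator
// field is modeled here for illustration.
#[derive(Default, Debug, PartialEq)]
struct EndStats {
    effective_rewards: u64,
}

#[derive(Default)]
struct EpochState {
    end: Option<EndStats>,
}

// The defensive variant: seed a default `EndStats` when `end` is `None`
// instead of panicking, then mutate it in place.
fn accumulate(entity: &mut EpochState, delta: u64) {
    let end = entity.end.get_or_insert_with(EndStats::default);
    end.effective_rewards += delta;
}

fn main() {
    // `end = None` (e.g. a fixture that skipped Genesis seeding) no longer panics.
    let mut state = EpochState::default();
    accumulate(&mut state, 10);
    assert_eq!(state.end.as_ref().unwrap().effective_rewards, 10);

    // The happy path (ESTART already seeded the slot) behaves identically.
    let mut seeded = EpochState { end: Some(EndStats::default()) };
    accumulate(&mut seeded, 10);
    assert_eq!(seeded.end.as_ref().unwrap().effective_rewards, 10);
    println!("ok");
}
```

Because a later `EpochWrapUp`-style overwrite replaces the whole `EndStats`, seeding a default here changes nothing on the happy path.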
`apply` has three early-return guards (idempotent repeat, out-of-order, total_shards mismatch) that leave state untouched. `undo` unconditionally subtracted the deltas and overwrote `ashard_progress`, so a rollback following a skipped `apply` would underflow the u64 `end.*` fields and clobber the cursor. Capture `prev_ashard_progress` and set an `applied` flag during `apply` only when state is actually mutated; `undo` early-returns when `!applied` and restores from the snapshot. Same pattern as `EpochWrapUp`/`EpochStatsUpdate`.

Also broaden `any_epoch_state` to vary `ashard_progress` so the existing roundtrip proptests for `EpochWrapUp` and `EpochTransition` exercise the `Some(_) → None → Some(_)` path their apply/undo introduced earlier in this branch (previously only `None → None` was covered). Add a dedicated `epoch_end_accumulate_roundtrip` proptest covering all four progress shapes and all three skip branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
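The applied-flag pattern described above can be sketched in isolation. Everything here (`EpochEnd`, `Accumulate`, the single guard) is an illustrative simplification, not the real delta types, which carry three guards and more fields:

```rust
// Toy model of the snapshot-and-flag apply/undo pattern.
#[derive(Default, Debug, PartialEq, Clone)]
struct EpochEnd {
    effective_rewards: u64,
    ashard_progress: Option<u32>,
}

#[derive(Default)]
struct Accumulate {
    shard_index: u32,
    delta: u64,
    // Populated by `apply` only when it actually mutates state.
    applied: bool,
    prev_progress: Option<u32>,
}

impl Accumulate {
    fn apply(&mut self, end: &mut EpochEnd) {
        // Idempotency guard: skip if this shard (or a later one) already landed.
        if matches!(end.ashard_progress, Some(p) if p >= self.shard_index) {
            return; // leave `self.applied == false`
        }
        self.prev_progress = end.ashard_progress; // snapshot for undo
        end.effective_rewards += self.delta;
        end.ashard_progress = Some(self.shard_index);
        self.applied = true;
    }

    fn undo(&self, end: &mut EpochEnd) {
        if !self.applied {
            return; // apply was a no-op, so undo must be too
        }
        end.effective_rewards -= self.delta;
        end.ashard_progress = self.prev_progress;
    }
}

fn main() {
    let mut end = EpochEnd { effective_rewards: 5, ashard_progress: Some(3) };
    let snapshot = end.clone();

    // Shard 2 re-executes after a crash: apply skips, and undo must not
    // underflow `effective_rewards` or clobber the cursor.
    let mut acc = Accumulate { shard_index: 2, delta: 7, ..Default::default() };
    acc.apply(&mut end);
    acc.undo(&mut end);
    assert_eq!(end, snapshot);
    println!("ok");
}
```

The key property: `undo` after a skipped `apply` is a strict no-op, so crash-replay rollbacks can never drive the accumulators below zero.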
The divides-256 invariant on `account_shards` is enforced only via `debug_assert!` in `shard_key_range()`, which is stripped in release builds. An invalid TOML value (0, 3, 7, 100, ...) would deserialize cleanly and silently corrupt key-range coverage. Call `validate_total_shards()` at the top of `CardanoLogic::initialize` and surface failures as `ChainError::InvalidConfig` so misconfiguration fails the startup with a clear message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
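The two checks under discussion can be sketched together, assuming the documented first-byte-prefix scheme (`PREFIX_SPACE = 256`) and modeling entity-key prefixes as plain `u32` values; names mirror the review, but this is not the real dolos-cardano implementation:

```rust
const PREFIX_SPACE: u32 = 256;

// Startup-time validation: 0 and non-divisors of 256 are rejected with a
// clear message instead of corrupting key-range coverage later.
fn validate_total_shards(total_shards: u32) -> Result<(), String> {
    if total_shards == 0 {
        return Err("account_shards must be >= 1".into());
    }
    if PREFIX_SPACE % total_shards != 0 {
        return Err(format!(
            "account_shards must divide {PREFIX_SPACE}, got {total_shards}"
        ));
    }
    Ok(())
}

// Hard `assert!` (not `debug_assert!`) so an invalid value fails loudly in
// release builds instead of dividing by zero or silently truncating `step`.
fn shard_key_range(shard_index: u32, total_shards: u32) -> std::ops::Range<u32> {
    assert!(
        validate_total_shards(total_shards).is_ok(),
        "invalid total_shards: {total_shards}"
    );
    assert!(
        shard_index < total_shards,
        "shard_index {shard_index} >= total_shards {total_shards}"
    );
    let step = PREFIX_SPACE / total_shards;
    (shard_index * step)..((shard_index + 1) * step)
}

fn main() {
    assert!(validate_total_shards(0).is_err()); // rejected, not a div-by-zero
    assert!(validate_total_shards(3).is_err()); // non-divisor of 256
    assert_eq!(shard_key_range(0, 16), 0..16); // 16 shards => 16 prefixes each
    assert_eq!(shard_key_range(15, 16), 240..256); // last shard covers the tail
    println!("ok");
}
```

Validating once at initialize keeps `shard_key_range` total over all inputs that can actually reach it, while the `assert!` remains as a cheap second line of defense.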
After this branch's restructure, per-account reward application, unspendable routing, and EWRAP-time registration filtering live in AShard; only MIRs, refunds, and boundary close remain in Ewrap. Update the Classify, Work Units, Source Files, and Instrumentation tables so the bisection workflow points at the right work unit (and module path) for each failure shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
crates/cardano/src/lib.rs (1)
`250-258`: **Minor: deduplicate effective-shard-count computation.**

The body of `read_effective_account_shards` (lines 253-258) is identical to the inline block at lines 345-348 in `initialize`. Since `Self` isn't yet constructed there, the method form can't be reused, but extracting a free helper (or `Self::compute_effective_account_shards::<D>(state, &config)`) would keep both call sites in sync if the fallback rule ever changes.

♻️ Suggested refactor

```diff
-    /// Compute the effective `account_shards` value: stored
-    /// `ashard_progress.total` if a boundary is in flight, otherwise the
-    /// configured value.
-    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
-        load_epoch::<D>(state)
-            .ok()
-            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-            .unwrap_or_else(|| self.config.account_shards())
-    }
+    /// Compute the effective `account_shards` value: stored
+    /// `ashard_progress.total` if a boundary is in flight, otherwise the
+    /// configured value.
+    fn compute_effective_account_shards<D: Domain>(state: &D::State, config: &CardanoConfig) -> u32 {
+        load_epoch::<D>(state)
+            .ok()
+            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
+            .unwrap_or_else(|| config.account_shards())
+    }
+
+    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
+        Self::compute_effective_account_shards::<D>(state, &self.config)
+    }
```

Then at lines 345-348:

```diff
-    let effective_account_shards = load_epoch::<D>(state)
-        .ok()
-        .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-        .unwrap_or_else(|| config.account_shards());
+    let effective_account_shards =
+        Self::compute_effective_account_shards::<D>(state, &config);
```
- let effective_account_shards = load_epoch::<D>(state) - .ok() - .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total)) - .unwrap_or_else(|| config.account_shards()); + let effective_account_shards = + Self::compute_effective_account_shards::<D>(state, &config);Also applies to: 343-348
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/cardano/src/lib.rs` around lines 250 - 258, The effective-shard-count computation is duplicated between the method read_effective_account_shards and the inline block in initialize; extract the logic into a single helper (either a free function like compute_effective_account_shards::<D>(state, &config) or an associated fn Self::compute_effective_account_shards::<D>(state, &config)) that returns the u32 by loading the epoch and falling back to config.account_shards(), then replace both the body of read_effective_account_shards and the inline compute in initialize to call that helper so both call sites stay in sync if the fallback changes.
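The "persisted total else config" fallback rule that the nitpick wants centralized is small enough to sketch on its own; `AShardProgress` here is a stand-in for the real type, and the function signature is illustrative:

```rust
// Stand-in for the persisted per-boundary progress record.
struct AShardProgress {
    total: u32,
}

// An in-flight boundary pins the shard count it started with, so a config
// change mid-boundary cannot change the key-range partition under it;
// otherwise the configured value applies.
fn effective_account_shards(in_flight: Option<&AShardProgress>, configured: u32) -> u32 {
    in_flight.map(|p| p.total).unwrap_or(configured)
}

fn main() {
    // No boundary in flight: use the configured value.
    assert_eq!(effective_account_shards(None, 16), 16);
    // Boundary in flight with 8 shards: the persisted total wins.
    assert_eq!(effective_account_shards(Some(&AShardProgress { total: 8 }), 16), 8);
    println!("ok");
}
```

Putting this one-liner behind a single helper is exactly what keeps the `initialize` and `read_effective_account_shards` call sites from drifting.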
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3543b5af-c9cb-43d0-942e-a2c733a526c3
📒 Files selected for processing (4)
- crates/cardano/src/lib.rs
- crates/cardano/src/model/epochs.rs
- crates/core/src/lib.rs
- skills/debug-epoch-mismatch/SKILL.md
✅ Files skipped from review due to trivial changes (1)
- crates/core/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- skills/debug-epoch-mismatch/SKILL.md
Summary
Restructures the EWRAP epoch-boundary pipeline across six commits:
1. (`948d472e`, pre-existing) — initial split of the monolithic EWRAP into three phases.
2. `EwrapPrepare` → `Ewrap`, seed `EpochState.end` in ESTART (`9627c3ae`) — drop the misleading "Prepare" name; ESTART's `EpochTransition` now opens the `end` slot.
3. `EwrapShard` → `AccountShard` (`ae260792`) — name the per-account work unit by what it processes, not its position in the pipeline.
4. `AccountShard` runs before `Ewrap` (`eaf8e08c`) — settle per-account effects first, then run the global Ewrap visitors. `EpochEndInit` becomes a PATCH delta; `EpochEndAccumulate` accepts `ewrap_progress = None` as the natural shard-0 starting state.
5. `EwrapFinalize` into `Ewrap`; drop `EpochEndInit` (`a1e0d5cf`) — collapse the redundant finalize phase into Ewrap; the wrap-up visitor now assembles the full `EndStats` (prepare-time fields + shard accumulators) and emits a single `EpochWrapUp`. One fewer state-machine state, one fewer delta type, one fewer commit cycle. Boundary close is now atomic in a single state-writer commit.
6. `EpochState.end` in Genesis (`5122e075`) — restores the invariant that `end = Some(...)` whenever an `AccountShard` runs (the first boundary's shards otherwise hit a fresh genesis state with `end = None`).

Final pipeline:

```text
Estart → Roll … → Rupd → Roll … → AccountShard ×N → Ewrap
```

(where Ewrap now both runs the global visitors and closes the boundary).

Test plan
- `cargo check -p dolos-cardano` clean
- `cargo test -p dolos-cardano` — all 98 unit tests pass (work-buffer state-machine, epoch delta proptests, etc.)
- `cargo build --tests` — full workspace compiles
- `cargo test --test memory` passes
- `cargo test --test epoch_pots --release` — gold-standard end-to-end against DBSync ground truth (requires populated Mithril test instance)
- `5122e075` resolved the original panic

🤖 Generated with Claude Code
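The shard-then-wrap ordering the final pipeline describes can be modeled as a tiny sequence generator; this is a toy illustration of the boundary leg only (names follow the PR, but the real work-buffer state machine in `work.rs` is considerably more involved):

```rust
// Toy model of the boundary sequencing: N `AccountShard` units, then a
// single `Ewrap` that runs the global visitors and closes the boundary.
#[derive(Debug, PartialEq)]
enum WorkUnit {
    AccountShard(u32),
    Ewrap,
}

fn boundary_sequence(total_shards: u32) -> Vec<WorkUnit> {
    // Mirrors the fail-fast stance discussed in the review: a zero shard
    // count is a config bug, not a "skip shards" request.
    assert!(total_shards >= 1, "account_shards must be >= 1");
    (0..total_shards)
        .map(WorkUnit::AccountShard)
        .chain(std::iter::once(WorkUnit::Ewrap))
        .collect()
}

fn main() {
    let seq = boundary_sequence(3);
    assert_eq!(
        seq,
        vec![
            WorkUnit::AccountShard(0),
            WorkUnit::AccountShard(1),
            WorkUnit::AccountShard(2),
            WorkUnit::Ewrap,
        ]
    );
    println!("ok");
}
```

A multi-shard test along these lines (asserting exactly N `AccountShard` units precede `Ewrap`) is what the review asks for to catch off-by-one regressions that `TEST_TOTAL_SHARDS = 1` cannot see.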
Summary by CodeRabbit
New Features
- `account_shards` configuration (defaults to 16) to control shard parallelism.

Improvements
Tests & Docs